{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "26dd9d71",
   "metadata": {},
   "source": [
    "# Example: Browse Offline Evaluations\n",
    "\n",
    "In this notebook, we want to explore how evaluations can be viewed locally after downloading them via `tsbench evaluations download`.\n",
    "\n",
    "Whether evaluations are downloaded from your own AWS Sagemaker experiment (via passing the `--experiment` flag to the download command) or you use the [publicly available evaluations](https://registry.opendata.aws/tsbench/) is irrelevant.\n",
    "\n",
    "Note that, by default, the command only downloads the metrics obtained. This ensures that you only need to download ~20 MiB of data and should suffice for most use cases. If you need the actual forecasts (e.g. to build ensembles), you must pass the `--include_forecasts` flag. These amount to almost 600 GiB of data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3f15aa9",
   "metadata": {},
   "source": [
    "## Initialize the Tracker\n",
    "\n",
    "The tracker is responsible for accessing the evaluations that are available offline. Whenever you want to access evaluations, you should **only** use this class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "af4ebe31",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from tsbench.evaluations.tracking import ModelTracker"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b9324dc",
   "metadata": {},
   "source": [
    "When the tracker is initialized for the first time, it loads the performance metrics obtained from all evaluations. For the publicly available data, this takes roughly 7 seconds. Afterwards, the tracker will be cached.\n",
    "\n",
    "_**Note:** If you modify the code for anything related to the tracker, you will need to delete the cache which is located at `~/.cache/tsbench`._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5bd80c78",
   "metadata": {},
   "outputs": [],
   "source": [
    "tracker = ModelTracker.from_directory(Path.home() / \"evaluations\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1542be6",
   "metadata": {},
   "source": [
    "## Aggregate Metrics\n",
    "\n",
    "At first, we want to have a look at all available offline evaluations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a3bcd90f",
   "metadata": {},
   "outputs": [],
   "source": [
    "evaluations = tracker.get_evaluations()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "77d8a353",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = evaluations.dataframe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3b8e1e4",
   "metadata": {},
   "source": [
    "The data frame provides us with a mapping from datasets and model configurations to performance metrics. For example, to obtain the best model configuration in terms of nCRPS on the `m4_monthly` dataset, we can run the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "bee378d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "83d929e9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Best nCRPS: 0.0920\n",
      "Best Model: ('m4_monthly', 'deepar', 1.0, 0.001, 2.0, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 2.0, 40.0)\n"
     ]
    }
   ],
   "source": [
    "dataset_evaluations = df.query(\"dataset == 'm4_monthly'\")\n",
    "print(f\"Best nCRPS: {dataset_evaluations.ncrps_mean.min():.4f}\")\n",
    "\n",
    "i = np.argmin(dataset_evaluations.ncrps_mean)\n",
    "print(f\"Best Model: {dataset_evaluations.index[i]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f05e1e8e",
   "metadata": {},
   "source": [
    "This model representation makes more sense when looking at the names of the index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8edb44a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "FrozenList(['dataset', 'model', 'model_training_fraction', 'model_learning_rate', 'model_context_length_multiple', 'model_mqcnn_num_filters', 'model_mqcnn_kernel_size_first', 'model_mqcnn_kernel_size_hidden', 'model_mqcnn_kernel_size_last', 'model_tft_hidden_dim', 'model_tft_num_heads', 'model_simple_feedforward_hidden_dim', 'model_simple_feedforward_num_layers', 'model_nbeats_num_stacks', 'model_nbeats_num_blocks', 'model_deepar_num_layers', 'model_deepar_num_cells'])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index.names"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f04423f4",
   "metadata": {},
   "source": [
    "Essentially, it boils down to the following configuration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "18eed250",
   "metadata": {},
   "outputs": [],
   "source": [
    "from tsbench.config.model.models import DeepARModelConfig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "8c188ca1",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_config = DeepARModelConfig(context_length_multiple=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a37ce41c",
   "metadata": {},
   "source": [
    "## Obtain Metrics from the Tracker\n",
    "\n",
    "For a particular combination of model configuration and dataset, it is very easy to obtain the performance metrics, the forecasts, and plenty of additional information. Consult the documentation of the `ModelTracker` class for all possible operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "8d1b8984",
   "metadata": {},
   "outputs": [],
   "source": [
    "from tsbench.config import DATASET_REGISTRY, Config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "4819f9d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "config = Config(model_config, DATASET_REGISTRY[\"m4_monthly\"]())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f7485519",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Performance(training_time=Metric(mean=24000.048828125, std=4799.99609375), latency=Metric(mean=0.008198121096938848, std=8.529191836714745e-05), num_model_parameters=Metric(mean=23164.0, std=0.0), num_gradient_updates=Metric(mean=349418.5, std=62984.5), ncrps=Metric(mean=0.09203928336501122, std=0.00035720691084861755), mase=Metric(mean=0.9362838566303253, std=0.006132692098617554), smape=Metric(mean=0.12887728214263916, std=0.001025184988975525), nrmse=Metric(mean=0.2828601896762848, std=0.0015260577201843262), nd=Metric(mean=0.11401168629527092, std=0.0006160400807857513))"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tracker.get_performance(config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "197615fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "forecasts = tracker.get_forecasts(config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41ef37e8",
   "metadata": {},
   "source": [
    "The forecasts are an array of generated forecasts for the provided configuration (since the same model configuration might have been evaluated on the same dataset for multiple seeds)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "5d08cb24",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7', '0.8', '0.9']"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "forecasts[0].quantiles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "2f92ae8a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(48000, 9, 18)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "forecasts[0].values.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8786ef9",
   "metadata": {},
   "source": [
    "A single forecast object then provides the forecasts for all time series (in this case: 48,000) across all evaluated quantiles (10-quantiles) and the entire forecast horizon (in this case: 18 steps)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81dd5cbe",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
